Automatic Semantics Using Google

Authors

  • Rudi Cilibrasi
  • Paul Vitanyi
Abstract

We have found a method to automatically extract the meaning of words and phrases from the world-wide-web using Google page counts. The approach is novel in its unrestricted problem domain, simplicity of implementation, and manifestly ontological underpinnings. The world-wide-web is the largest database on earth, and the latent semantic context information entered by millions of independent users averages out to provide automatic meaning of useful quality. We demonstrate positive correlations, evidencing an underlying semantic structure, in both numerical symbol notations and number-name words in a variety of natural languages and contexts. Next, we demonstrate the ability to distinguish between colors and numbers, and between 17th century Dutch painters; the ability to understand electrical terms, religious terms, and emergency incidents; and the ability to do a simple automatic English-Spanish translation. We also conduct a massive experiment in understanding WordNet categories.
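The abstract does not spell out how page counts become a measure of meaning. As a rough illustration only, the sketch below uses the form usually given for the normalized Google distance in the authors' related work; the formula is an assumption here, since it is not stated in this abstract. The function name `normalized_distance`, its parameters, and all counts are hypothetical, and no real search API is called.

```python
import math


def normalized_distance(fx, fy, fxy, n):
    """Distance between two terms from page counts (assumed formula).

    fx, fy -- page counts for each term alone (hypothetical inputs)
    fxy    -- page count for pages containing both terms
    n      -- assumed total number of indexed pages
    Smaller values mean the terms tend to occur on the same pages.
    """
    if fx <= 0 or fy <= 0:
        raise ValueError("each term needs a positive page count")
    if n <= min(fx, fy):
        raise ValueError("n must exceed the smaller page count")
    if fxy <= 0:
        # The terms never co-occur: treat them as maximally distant.
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


if __name__ == "__main__":
    # Made-up counts, purely for illustration.
    total_pages = 10**9
    close = normalized_distance(50_000, 40_000, 20_000, total_pages)
    far = normalized_distance(50_000, 40_000, 50, total_pages)
    print(f"frequently co-occurring terms: {close:.3f}")
    print(f"rarely co-occurring terms:     {far:.3f}")
```

With counts like these, a pair of terms that often appear together gets a smaller distance than a pair that rarely do, which is the kind of signal the clustering and classification experiments listed in the abstract would rely on.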

Similar resources

Automatic Meaning Discovery Using Google

We present a new theory of relative semantics between objects, based on information distance and Kolmogorov complexity. This theory is then applied to construct a method to automatically extract the meaning of words and phrases from the world-wide-web using Google page counts. The approach is novel in its unrestricted problem domain, simplicity of implementation, and manifestly ontological unde...

Automatic Hashtag Recommendation in Social Networking and Microblogging Platforms Using a Knowledge-Intensive Content-based Approach

In social networking/microblogging environments, #tags are often used for categorizing messages and marking their key points. Also, since some social networks such as Twitter apply restrictions on the number of characters in messages, #tags can serve as a useful tool for helping users express their messages. In this paper, a new knowledge-intensive content-based #tag recommendation system is intr...

RESTDoc: Describe, Discover and Compose RESTful Semantic Web Services using Annotated Documentations

Development of a semantic web has gained a lot of traction recently. At the same time, another change is becoming popular on the web: a move from complex SOAP-based web services to simpler RESTful services that work over the existing HTTP infrastructure. Various techniques have been proposed to add semantics to RESTful services. But most of these solutions suffer from the fact that ...

From Strings to Things SAR-Graphs: A New Type of Resource for Connecting Knowledge and Language

Recent research and development have created the necessary ingredients for a major push in web-scale language understanding: large repositories of structured knowledge (DBpedia, the Google knowledge graph, Freebase, YAGO), progress in language processing (parsing, information extraction, computational semantics), linguistic knowledge resources (Treebanks, WordNet, BabelNet, UWN) and new powerful...

Exploiting Syntactic and Shallow Semantic Kernels for Question Answer Classification

We study the impact of syntactic and shallow semantic information in automatic classification of questions and answers and answer re-ranking. We define (a) new tree structures based on shallow semantics encoded in Predicate Argument Structures (PASs) and (b) new kernel functions to exploit the representational power of such structures with Support Vector Machines. Our experiments suggest that s...

Publication date: 2004